Main

MA plot of H2Av normalization

MA plots showing ER binding before and after treatment with fulvestrant, including matched Dm H2Av spike-in control. (A) Reads corrected to total aligned reads showed the same off-centre peak density as observed in Figure 1. Putative unchanged ER binding sites are within the red triangle. (B) Overlay of the MA plots showing the changes in chromatin binding of Hs ER (black) and Dm H2Av (blue). Dm peaks overlay the off-centre peak density. (C) Utilising the Dm H2Av binding events as a ground truth for no change (a log-fold change of 0), a linear fit to the log-fold change is generated and the fit is applied to adjust the Hs ER binding events.
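The correction described in panel (C) can be sketched as follows. This is a minimal illustration only, assuming M is the log2 fold change and A the mean log2 count of each peak, and that the fit is an ordinary least-squares line of M against A. The function names and the toy counts are hypothetical and are not taken from the authors' pipeline.

```python
import math

def ma_values(control, treated, pseudocount=1.0):
    """Return (A, M) per peak: A = mean log2 count, M = log2 fold change."""
    a_vals, m_vals = [], []
    for c, t in zip(control, treated):
        log_c = math.log2(c + pseudocount)
        log_t = math.log2(t + pseudocount)
        a_vals.append((log_c + log_t) / 2)
        m_vals.append(log_t - log_c)
    return a_vals, m_vals

def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def correct_m(a_vals, m_vals, slope, intercept):
    """Subtract the fitted spike-in trend so spike-in peaks sit at M = 0."""
    return [m - (slope * a + intercept) for a, m in zip(a_vals, m_vals)]

# Hypothetical toy counts: every spike-in peak is doubled in the treated
# sample, which we assume reflects technical bias rather than biology.
spike_control = [64, 128, 256, 512]
spike_treated = [128, 256, 512, 1024]
er_control = [128, 256]
er_treated = [128, 512]

spike_a, spike_m = ma_values(spike_control, spike_treated, pseudocount=0.0)
slope, intercept = fit_line(spike_a, spike_m)

er_a, er_m = ma_values(er_control, er_treated, pseudocount=0.0)
er_m_corrected = correct_m(er_a, er_m, slope, intercept)
```

In this toy case the first ER site looks unchanged before correction but is halved once the spike-in trend is removed, while the second site, which looked doubled, is in fact unchanged.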

RARA gene locus with CTCF Spike-in

Figure 2 can be viewed interactively on the UCSC track.

H2av with DiffBind

Comparison of DiffBind output before and after applying the corrected size factors from our pipeline, generated from the Drosophila spike-in control. (A) Analysis of ER binding before and after treatment with fulvestrant demonstrates that DiffBind’s default normalisation strategy is more effective than the DESeq2 default, but still shows a bias between samples. (B) Applying the corrected size factors from our DESeq2 pipeline reduces the bias in the analysis (Data: SLX-8047).

Linear model

Comparison of mean counts in CTCF peaks before and after treatment. If the samples have no systematic bias before and after treatment, then the linear fit would be expected to have a gradient of 1. Here, we establish that the gradient is < 1, implying a systematic bias between samples. The read counts in the peaks of the treated samples are corrected (blue), removing the bias and resulting in a new gradient of 1.
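The gradient correction can be illustrated with a short sketch. It assumes, for simplicity, a least-squares fit constrained through the origin and a correction that divides the treated counts by the fitted gradient; the caption does not state whether the original fit includes an intercept, and the function name and counts below are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares gradient of y = g * x (no intercept term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Hypothetical mean counts in CTCF peaks (control vs. treated).
ctcf_control = [100.0, 200.0, 400.0]
ctcf_treated = [80.0, 160.0, 320.0]

# A gradient below 1 indicates a systematic loss of signal in treated samples.
gradient = slope_through_origin(ctcf_control, ctcf_treated)

# Rescale the treated counts so the control peaks return to a gradient of 1.
ctcf_corrected = [t / gradient for t in ctcf_treated]
new_gradient = slope_through_origin(ctcf_control, ctcf_corrected)
```

The same scaling would then be applied to the ER peak counts of the treated samples, on the assumption that CTCF binding itself is unchanged by treatment.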

Normalisation factors are consistent over a wide range in number of control binding sites

Stability of the CTCF-derived normalisation coefficient. Stability was analysed by randomly sub-sampling the CTCF peaks (between 1% and 100% of total sites) before undertaking the calculation. This analysis was repeated 100 times to model the variability of the result.
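A sub-sampling analysis of this kind might look like the sketch below. It is an illustration under stated assumptions: the normalisation coefficient is taken to be a through-origin gradient of treated against control counts, the peak counts are simulated rather than real, and all names are hypothetical.

```python
import random
import statistics

def slope_through_origin(x, y):
    """Least-squares gradient of y = g * x (no intercept term)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def subsample_coefficients(control, treated, fraction, repeats, rng):
    """Recompute the coefficient on random subsets of the control peaks."""
    n = len(control)
    k = max(1, round(n * fraction))
    coefficients = []
    for _ in range(repeats):
        idx = rng.sample(range(n), k)  # sampling without replacement
        coefficients.append(
            slope_through_origin([control[i] for i in idx],
                                 [treated[i] for i in idx])
        )
    return coefficients

# Simulated counts (assumption): true coefficient 0.8 with +/-10% noise.
rng = random.Random(42)
control = [rng.uniform(50, 500) for _ in range(1000)]
treated = [0.8 * c * rng.uniform(0.9, 1.1) for c in control]

# Mean and standard deviation of the coefficient at each sampling fraction.
spread = {}
for pct in (1, 10, 50, 100):
    coeffs = subsample_coefficients(control, treated, pct / 100, 100, rng)
    spread[pct] = (statistics.mean(coeffs), statistics.stdev(coeffs))
```

A coefficient that is stable should show a mean that barely moves across fractions, with the spread shrinking as more peaks are included.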

Comparison of CTCF and H2Av normalisation methods

Comparison of normalisation methods using consensus peak set. (A) The analyses of the CTCF-normalised (blue) and H2Av-normalised (green) datasets, using an ER consensus peak set of 10,000 peaks, were formatted as MA plots and overlaid. This recovered the low-fold-change, higher-intensity peaks that were not visible in Figure \ref{fig:ERCTCF}A, and both datasets showed a similar distribution. (B) Comparison of fold-change values for individual ER binding sites between the two datasets showed that the inclusion of these sites did not appear to affect the correlation (r = 0.77).

Supplementary

Comparison of simple normalisation methods

Comparison of simple normalisation strategies employed. MA plots showing the changes in ER binding after 48 hours treatment with 100 nM fulvestrant. Three simple normalisation methods were applied to this data and compared to the raw count data. (A) Raw counts. (B) Reads Per Million (RPM) reads in peaks. (C) RPM aligned reads. (D) RPM total reads. Note that the highlighted peaks remain above zero under all three standard normalisations.
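The three simple scalings in panels (B) to (D) differ only in the library size used as the denominator. The sketch below shows this with hypothetical counts and library sizes; the function name `rpm` is illustrative.

```python
def rpm(peak_counts, library_size):
    """Scale raw peak counts to reads per million of the given library size."""
    scale = 1_000_000 / library_size
    return [count * scale for count in peak_counts]

# Hypothetical values for a single sample.
peak_counts = [50, 150, 300]
reads_in_peaks = sum(peak_counts)   # denominator for panel (B)
aligned_reads = 2_000_000           # denominator for panel (C)
total_reads = 2_500_000             # denominator for panel (D)

rpm_in_peaks = rpm(peak_counts, reads_in_peaks)
rpm_aligned = rpm(peak_counts, aligned_reads)
rpm_total = rpm(peak_counts, total_reads)
```

Because each variant multiplies every peak in a sample by a single constant, none of them can correct a bias that is not reflected in the chosen library size.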

Method comparison

Comparison of ChIP-seq pipelines. (A) ChIPComp data were plotted from the CountSet object; the results show a high number of false-positive up-regulated sites. (B) edgeR normalisation is designed for the analysis of transcriptional data. In the case of large-scale unidirectional changes in binding, the assumptions of the normalisation fail, giving rise to a distribution that is artificially symmetric. (C) DESeq2 makes use of similar assumptions and results in a similar distortion of the data. (D) DiffBind utilises normalisation to total library size and performs significantly better than the other three methods, but does not attempt to control for systematic bias in the pull-down efficiency of the ChIP.

Reproducibility plots

Correlation Plots of Replicate Experiments. (A) Scatter plots showing the correlation between the replicates with the lowest correlation value. This is provided for both the control (top) and treatment (bottom) conditions. The plotted condition is highlighted with a thick border in the tables on the right. Colour represents density: blue = lowest, red = highest. (B) Tables showing the correlation coefficient between each pair of replicates.

MA Plots of mouse ER normalization

MA plots showing the addition of Mm-derived chromatin spike-in to the ChIP-seq analysis of MCF-7 before and after treatment with fulvestrant. (A) MA plot after scaling-factor-based normalisation shows the same characteristic grouping of peaks off axis. (B) ER binding in Mm samples shows a considerable increase in binding after treatment of the MCF-7 cell line with fulvestrant. (C) Attempting to fit a correction factor to the data results in a significant distortion.

Relative read alignments in mouse samples

Distribution of reads for Mm chromatin spike-in normalisation strategy. Comparison of murine chromatin between samples showed no systematic bias in the sample preparation. Bar plots (left axis) represent the fraction of total aligned reads. The dot plot represents the total aligned reads (right axis) for each sample.

MA plots of CTCF Parallel-Factor ChIP

MA plots showing ER binding before and after treatment with fulvestrant, including matched CTCF control. (A) Reads corrected to total aligned reads showed the same off-centre peak density as observed in all data that were not normalised with an internal spike-in control. (B) Overlay of the MA plots showing the changes in chromatin binding of ER (black) and CTCF (grey). CTCF peaks overlay the off-centre peak density. (C) Utilising the CTCF binding events as a ground truth for no change (a log-fold change of 0), a linear fit to the log-fold change is generated (blue line). The fit is then also applied to the ER binding events.

ER and CTCF heatmaps

Clustering of samples before and after ER and CTCF peak extraction shows that the effect of fulvestrant on ER peaks drives clustering of the raw data. To confirm that the effects seen at the RARA locus were consistent across the genome, we compared the clustering of the CTCF and the ER peaks with respect to treatment with fulvestrant. Initial clustering was weakly correlated with the treatment condition (A). Clustering specifically on CTCF-derived peak data (B) resulted in a loss of grouping by treatment, while clustering specifically on ER-derived peak data (C) led to a clearer separation by treatment.

Comparison of control regions

Comparison of the control regions used to normalise ER analysis before and after treatment. Dots highlighted in red are significant (FDR = 0.01). (A) H2Av occupancy of the Drosophila genome shows no significant changes before and after treatment. (B) The CTCF peaks used for normalisation show no significant change in the number of reads before and after treatment.

Normalisation using DESeq2 SizeFactors

Normalisation of ER binding to an external spike-in, implemented using DESeq2. Highlighted data points are considered significant fold-changes with an FDR = 0.01. (A) Initial analysis of the ER binding with default parameters shows an equal increase and decrease in ER binding. The distribution seen is not reflective of the documented response of ER to treatment with fulvestrant. (B) Estimating the DESeq2 size factors from the sample spike-in corrects the distortion in the results.
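DESeq2 estimates size factors with a median-of-ratios procedure. The sketch below re-implements that idea in plain Python on spike-in counts only, so that the resulting factors can be applied to the ER counts. It is an illustration with hypothetical counts, not the authors' R code and not a call into DESeq2 itself.

```python
import math
import statistics

def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per peak), each a list of per-sample counts.
    Rows containing a zero are skipped, since their geometric mean is zero.
    """
    n_samples = len(counts[0])
    rows = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in rows]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg
                      for row, lg in zip(rows, log_geo_means)]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors

# Hypothetical spike-in counts: columns are control 1, control 2,
# treated 1, treated 2. Treated samples yield twice as many spike-in reads.
spike_counts = [
    [100, 100, 200, 200],
    [50, 50, 100, 100],
    [400, 400, 800, 800],
]
factors = size_factors(spike_counts)

# Apply the spike-in-derived factors to (hypothetical) ER peak counts.
er_counts = [[300, 300, 300, 300]]
er_normalised = [[c / f for c, f in zip(row, factors)] for row in er_counts]
```

In this toy case the raw ER counts look unchanged, but after dividing by the spike-in-derived factors the treated samples show half the signal of the controls.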

Normalisation of ER binding to an internal CTCF control. Highlighted data points are considered significant fold-changes with an FDR = 0.01. (A) Initial analysis with default DESeq2 parameters gives a similar distortion to that seen previously. (B) Using the CTCF peaks as an internal control allows the data to be corrected.

Comparison of normalisation DiffBind plots

Comparison of DiffBind results before and after our two methods of normalisation. (A) Normalisation to library size. (B) Applying the corrected size factors from our DESeq2 pipeline, generated from the CTCF internal control. (C) Applying correction using linear regression of CTCF peaks between conditions to normalise the data. The result is a 10.7% increase in the number of loci detected as having significantly changed ER binding.

Cross normalisation

Comparison of fold-change of ER binding after both xenogenic and cross-normalisation. (A) Scatter plot of fold-change as established at individual sites by each method. Pearson's correlation between the two methods is 0.992 (3 s.f., p-value tending to 0). Deviation of data points from parity is a result of the integer nature of read counts; nonetheless, this effect is very small, as demonstrated by the correlation coefficient between the two datasets. (B) Box plot showing the fold-change of ER binding before and after treatment at ∼550 ER sites proximal to CTCF binding. The mean and maximum fold-change are reduced at these sites by Parallel-Factor ChIP, but the effect is marginal.
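The per-site agreement in panel (A) is summarised by Pearson's correlation coefficient. A minimal sketch of that calculation is shown below; the fold-change values are hypothetical and chosen only to illustrate two methods that agree closely but not exactly.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical log fold-changes at the same five sites from each method.
fc_xenogenic = [-2.0, -1.5, -1.0, -0.5, 0.0]
fc_cross = [-2.1, -1.4, -1.0, -0.6, 0.1]

r = pearson(fc_xenogenic, fc_cross)
```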

Activation of ER in MCF7

Comparison of fold-change of ER binding before and after treatment with estradiol. MA plot of ER binding after normalisation to CTCF binding displays a significant increase in ER binding at 45 minutes after treatment with estradiol.

Comparison of log(Counts) for binding sites was undertaken to confirm reproducibility. The data with the lowest correlation are shown and were seen between Replicate 1 and Replicate 3 in the control condition.

Comparison of ER binding from public datasets

Comparison of ER binding from public datasets. Common ER peaks detected in Ross-Innes CS, et al., 2010; Welboren WJ, et al., 2009; Ceschin DG, et al., 2011; and our data (FDR = 0.01). The Venn diagram was generated with ChIPSeqAnno.

Changes in H4K12ac following E2 treatment

Comparison of fold-change of H4 acetylation (Lys12) before and after treatment with estradiol. MA plot of H4K12ac after normalisation to CTCF binding displays an increase at 2 hours after treatment with estradiol.

Comparison of log(Counts) for binding sites was undertaken to confirm reproducibility. The data with the lowest correlation are shown and were seen between Replicate 2 and Replicate 3 in the control condition.

H4K12ac occupancy profile before and after treatment with E2 shows a general increase around transcription start sites (TSS).

PDX Analysis